Abstract

Debugging lazy functional programs poses serious challenges. In 
support of the "stop, examine, and resume" debugging style of 
imperative languages, some debugging tools abandon lazy evalu- 
ation. Other debuggers preserve laziness but present it in a way that 
may confuse programmers because the focus of evaluation jumps 
around in a seemingly random manner. 

In this paper, we introduce a supplemental tool, the algebraic 
program stepper. An algebraic stepper shows computation as a 
mathematical calculation. Algebraic stepping could be particu- 
larly useful for novice programmers or programmers new to lazy 
programming. Mathematically speaking, an algebraic stepper renders
computation as the standard rewriting sequence of a lazy λ-calculus.
Our novel lazy semantics introduces lazy evaluation as a form of
parallel program rewriting. It represents a compromise between
Launchbury's store-based semantics and a simple, axiomatic
description of lazy computation as sharing-via-parameters, à la
Ariola et al. Finally, we prove that the stepper's run-time machinery
correctly reconstructs the standard rewriting sequence.

Categories and Subject Descriptors D.3.1 [Formal Definitions 
and Theory]: Semantics; D.3.2 [Language Classification]: Ap- 
plicative (functional) languages; D.3.3 [Processors]: Debuggers 

General Terms lazy programming; debugging and stepping; lazy 
lambda calculus 

1. How Functional Programming Works 

Hughes (1989) explains why functional programming matters. By
"functional programming" Hughes specifically means "lazy" func- 
tional programming, and by "matters" he refers to the distinct ad- 
vantages of laziness for programming with reusable components, 
i.e., functions and programs. Hughes's examples demonstrate the 
ease of creating functions and programs by gluing together existing 
functions and programs. The key advantage of laziness is that even 
if one component appears to produce too much data, the laziness 
of the consuming component almost always naturally eliminates 
the superfluous data production. Since then, numerous researchers 
have extended this argument with examples of their own, usually 
accompanied by elegant code in Haskell. 

Unfortunately, laziness also increases the distance between the
programmer and the underlying machinery. Specifically, laziness
reduces a programmer's ability to predict when certain expressions
are evaluated during program execution. As long as things work,
this cognitive dissonance poses no problems. When a program
exhibits erroneous behavior, however, programmers are often at
a loss. A programmer can turn to a debugger for help, but the
evaluation of lazy programs is often confusing enough that lazy
debuggers resort to hiding laziness from the programmer in order
to display useful information (Wallace et al. 2001; Ennals and
Peyton Jones 2003a; Allwood et al. 2009).

An ideal debugger should not modify the execution model of a
program. The maintainers of the Glasgow Haskell Compiler (Peyton
Jones et al. 1992) share this sentiment, since the bundled GHCi
debugger abides by this ideal. The authors of the GHCi debugger
(Marlow et al. 2007) state that their debugger "lets the programmer
see the effects of laziness," and therefore, "shows the
programmer what is actually happening in their program at runtime".
Unfortunately, the authors also acknowledge that their debugger
presents lazy computations in a way that is difficult to follow,
mostly through seemingly random jumps from one place to
another in the program.

To mitigate the drawbacks of existing lazy debuggers, we
propose a supplementary tool, the algebraic stepper. PLT's DrRacket
(Findler et al. 2002) comes with such a stepper for the
call-by-value Racket¹ language. Given a functional program, the
algebraic stepper displays its standard reduction sequence in the
by-value λ-calculus (Clements et al. 2001). Our experience suggests
that this kind of tool especially benefits novice programmers
when they try to understand small programs, and programmers who
are new to the language and are trying to explore some linguistic
feature.

DrRacket's stepper presents evaluation of a program directly as
a manipulation of the source, similar to the calculated manipulations
of a student of mathematics. We conjecture that this model
is suitable for addressing the confusing nature of lazy evaluation.
In this paper, we present: (1) our stepper for Lazy Racket
(Barzilay and Clements 2005); (2) its underlying semantics, a novel
lazy λ-calculus; and (3) a proof that the stepper correctly implements
the standard rewriting semantics of the calculus. Roughly speaking,
Lazy Racket is a call-by-need language that uses the same
evaluation mechanism as Haskell.² A Lazy Racket module
macro-expands its surface syntax into a plain Racket program enriched
with appropriate delay and force constructs (Hatcliff and Danvy
1997). Lazy Racket is mostly used in educational settings, where
we have tested the prototype of the stepper so far, though some
programmers have found it useful to construct parser combinators
or game trees in Lazy Racket modules, and then to export such
pieces to plain Racket modules.

The novel lazy λ-calculus introduces the idea of selective parallel
reduction to simulate shared reductions. On the one hand, it is
nearly trivial to prove an equivalence between the standard rewriting
semantics of our calculus and a Launchbury-style, store-based
semantics (Launchbury 1993). On the other hand, the calculus is the
appropriate basis for a correctness proof of the stepper. For the
correctness proof, we construct a model of the delay-and-force
implementation, further enriched with continuation marks (Clements
et al. 2001), show that it bisimulates the standard rewriting
semantics, and finally, exploit Clements's strategy for the rest of the
proof (Clements 2006).

1 DrRacket and Racket were formerly known as DrScheme and PLT
Scheme, respectively.

2 The implementation of Lazy Racket is similar to SRFI 45 of the Scheme
Standard.

Section 2 briefly introduces Lazy Racket and its stepper. Section 3
presents a novel lazy λ-calculus and its essential theorems.
Section 4 summarizes the essential details of the implementation
of Lazy Racket and presents a model of the lazy stepper. Section 5
presents a correctness proof for the stepper and the penultimate
section summarizes related work.

2. Lazy Racket and Its Stepper 

Lazy Racket programs are sequences of defines and expressions 
that usually refer to the definitions. Here is a basic example: 

(define (f x) (+ x x))
(f (+ 1 (+ 2 3)))

A programmer invokes the Lazy Racket stepper from the DrRacket
IDE. Running the stepper displays the reduction sequence for the
current program. Figure 1 shows a sequence of screenshots stepping
through the above program, with each shot displaying one
reduction step. A green box highlights the redex(es) on the left-hand
side of a reduction step and a purple box highlights the
contractum(s) on the right-hand side. The programmer can navigate
the reduction sequence in either the forward or backward direction.
Additional navigation features are in the planning stages.

(define (f x) (+ x x))   ; shown unchanged on both sides of every step

Step 1/6   (f (+ 1 (+ 2 3)))
        -> ((lambda (x) (+ x x)) (+ 1 (+ 2 3)))

Step 2/6   ((lambda (x) (+ x x)) (+ 1 (+ 2 3)))
        -> (+ (+ 1 (+ 2 3)) (+ 1 (+ 2 3)))

Step 3/6   (+ (+ 1 (+ 2 3)) (+ 1 (+ 2 3)))
        -> (+ (+ 1 5) (+ 1 5))

Step 4/6   (+ (+ 1 5) (+ 1 5))
        -> (+ 6 6)

Step 5/6   (+ 6 6)
        -> 12

Step 6/6   All of the definitions have been successfully evaluated.

Figure 1. Lazy Stepper Example 1



In step 2, evaluation of the function argument is delayed so an 
unevaluated argument replaces each instance of the variable x in 
the function body. In step 3, evaluation of the program at the first 
x position requires the value of the argument, so the argument is 
forced in steps 3 and 4. In steps 3 and 4, all shared instances of 
the argument are reduced simultaneously. That is, the stepper ex- 
plains evaluation as an algebraic process using a form of parallel 
reduction. Since the second x refers to the same delayed computa- 
tion as the first x, by the time evaluation of the program requires a 
value at the second x position, a result is already available because 
the computed value of the first x was saved. In short, no argument 
evaluation is repeated. 
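The sharing behavior described above can be sketched outside Lazy Racket. The following Python fragment is only an illustration under stated assumptions: the `Thunk` class, the `evaluations` counter, and the function names are hypothetical stand-ins, not part of the stepper or of Lazy Racket. It models the argument `(+ 1 (+ 2 3))` as one delayed computation that both occurrences of `x` share.

```python
class Thunk:
    """A delayed computation that runs at most once and caches its result."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
            self._compute = None  # drop the closure once the value is known
        return self._value


evaluations = 0  # counts how often the argument expression is evaluated


def argument():
    """Stands for the argument expression (+ 1 (+ 2 3))."""
    global evaluations
    evaluations += 1
    return 1 + (2 + 3)


def f(x):
    """Stands for (define (f x) (+ x x)); both uses of x share one thunk."""
    return x.force() + x.force()


result = f(Thunk(argument))
print(result, evaluations)  # 12 1
```

The second `force` finds the cached value, which corresponds to the stepper reducing both shared instances of the argument in the same step.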

A second example introduces the lazy take! function, which
extracts the first n elements of a specified list:

(define (take! n lst)
  (if (= n 0)
      null
      (cons (first lst)
            (take! (- n 1) (rest lst)))))
(define (f lst) (+ (first lst) (second lst)))
(f (take! 3 (list 1 2 (/ 1 0) 4)))

The reduction sequence for this program, as viewed in the stepper,
appears in figure 2. For space reasons, only interesting steps are
shown.

In this example, the result of the take! computation is the
argument to the function f. The take! computation extracts the
first three elements of its list argument, but f only uses the first two
list elements, so the third element, which produces an error, should
not be forced. In step 3, the take! computation is forced because
both + and first are strict. In Lazy Racket, cons behaves lazily
and does not evaluate its arguments (Friedman and Wise 1976), so
in step 5, the result of the take! computation is a cons with two
thunks: one that retrieves the first element of the list, and one that
contains the next iteration of the take! computation. In step 7,
the first addition operand is finally reduced to a value. Notice that
the first element in the argument to second is already reduced as
well. The remaining steps force the next iteration of take! and
similarly extract the second element of the list. Since only the first
two elements of the list are needed, no additional take! iterations
are computed and the division by zero never raises an error.
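A rough Python analogue of this example may help make the role of the lazy cons concrete. It is a sketch under assumptions: `Thunk`, `Cons`, `lazy_list`, and `take` are hypothetical names introduced here, and Python's integer division `1 // 0` stands in for `(/ 1 0)`. Both fields of every pair are delayed, so the erroneous third element is never computed.

```python
class Thunk:
    """A delayed computation that runs at most once and caches its result."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
        return self._value


class Cons:
    """A lazy pair: both fields hold thunks, mirroring a lazy cons."""

    def __init__(self, first, rest):
        self.first = first
        self.rest = rest


NULL = None  # stands for the empty list


def lazy_list(*element_computations):
    """Build a lazy list from zero-argument functions, one per element."""
    result = NULL
    for compute in reversed(element_computations):
        result = Cons(Thunk(compute), Thunk(lambda tail=result: tail))
    return result


def first(lst):
    return lst.first.force()


def rest(lst):
    return lst.rest.force()


def second(lst):
    return first(rest(lst))


def take(n, lst):
    """Analogue of take!: returns a pair of two unevaluated thunks."""
    if n == 0:
        return NULL
    return Cons(Thunk(lambda: first(lst)),
                Thunk(lambda: take(n - 1, rest(lst))))


def f(lst):
    return first(lst) + second(lst)


source = lazy_list(lambda: 1, lambda: 2, lambda: 1 // 0, lambda: 4)
total = f(take(3, source))
print(total)  # 3
```

Only an explicit demand for the third element, for example `first(rest(rest(take(3, source))))`, would trigger the division by zero.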

A third example involves infinite lists: 

(define (add-one x) (+ x 1)) 

(define nats (cons 1 (map add-one nats))) 

(+ (second nats) (third nats)) 

More importantly, it involves map, which the stepper has not annotated
because it is a library function. The reduction sequence
for this program appears in figure 3. Again, some function definitions
and reduction steps have been elided from the screenshots.
In step 1, the evaluation of second forces the map expression
to a cons containing two thunks. Unlike the second example,
the thunks are displayed as <DelayedEvaluation#1> and
<DelayedEvaluation#2> because their contents are unknown,
i.e., they were not part of the source program. In step 2, the second
expression extracts <DelayedEvaluation#1> from the list, but
the thunk is still unevaluated. In steps 3 and 4, evaluation of the
program requires the value of <DelayedEvaluation#1>, so it is
forced. Observe how the stepper updates the nats definition with
the result as well. The remaining steps show the similar evaluation
of the other addition operand and are thus omitted.

Step 3/16
(+ (first
    ((lambda (n lst)
       (if (= n 0)
           null
           (cons (first lst) (take! (- n 1) (rest lst)))))
     3
     (list 1 2 (/ 1 0) 4)))
   (second
    ((lambda (n lst)
       (if (= n 0)
           null
           (cons (first lst) (take! (- n 1) (rest lst)))))
     3
     (list 1 2 (/ 1 0) 4))))
->
(+ (first
    (if (= 3 0)
        null
        (cons (first (list 1 2 (/ 1 0) 4))
              (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4))))))
   (second
    (if (= 3 0)
        null
        (cons (first (list 1 2 (/ 1 0) 4))
              (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4)))))))

Step 5/16
(+ (first
    (if false
        null
        (cons (first (list 1 2 (/ 1 0) 4))
              (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4))))))
   (second
    (if false
        null
        (cons (first (list 1 2 (/ 1 0) 4))
              (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4)))))))
->
(+ (first
    (cons (first (list 1 2 (/ 1 0) 4))
          (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4)))))
   (second
    (cons (first (list 1 2 (/ 1 0) 4))
          (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4))))))

Step 7/16
(+ (first (list 1 2 (/ 1 0) 4))
   (second
    (cons (first (list 1 2 (/ 1 0) 4))
          (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4))))))
->
(+ 1
   (second
    (cons 1 (take! (- 3 1) (rest (list 1 2 (/ 1 0) 4))))))

Figure 2. Lazy Stepper Example 2

As a final example, we use our stepper to understand the behavior
of a program presented by Marlow et al. (2007):

;; [Listof Char] -> [Listof [Listof Char]]
(define (lines s)
  (cond
    [(null? s) null]
    [else
     (define-values (l t)
       (break (lambda (x) (equal? "\n" x)) s))
     (cons l (cond
               [(null? t) null]
               ; drop "\n" char and recur
               [else (lines (cdr t))]))]))

;; [Char -> Boolean] [Listof Char]
;; ->* [Listof Char] [Listof Char]
(define (break p? l)
  (let L ([l l] [line null])
    (cond
      [(null? l) (values (reverse line) null)]
      [(p? (car l)) (values (reverse line) l)]
      [else (L (cdr l) (cons (car l) line))])))

The break function consumes two arguments, a predicate on characters
and a string represented as a list of characters, and splits the
string at the first character for which the predicate is true, returning
two substrings simultaneously.³ The delimiting character remains
as the first character of the second substring. The lines function
uses break to separate an input string into lines, where a "\n"
character begins a new line. Unlike break, the delimiting character
is not included in the output of lines.

Evaluating the expression (lines '("\n" "a")) produces
the expected result, an empty line (the empty string) and a line with
one "a" character (the !! function is a recursive force function):

> (!! (lines '("\n" "a")))
'(() ("a"))

However, (lines '("a" "\n")) produces only one line:

> (!! (lines '("a" "\n")))
'(("a"))

3 The values Racket construct enables multiple return values.

Step 1/9
(define nats
  (cons 1 (map (lambda (x) (+ x 1)) (cons 1 <DelayedEvaluation#0>))))
(+ (second
    (cons 1 (map (lambda (x) (+ x 1)) (cons 1 <DelayedEvaluation#0>))))
   (third nats))
->
(define nats
  (cons 1 (cons <DelayedEvaluation#1> <DelayedEvaluation#2>)))
(+ (second (cons 1 (cons <DelayedEvaluation#1> <DelayedEvaluation#2>)))
   (third nats))

Step 2/9
(define nats
  (cons 1 (cons <DelayedEvaluation#1> <DelayedEvaluation#2>)))
(+ (second (cons 1 (cons <DelayedEvaluation#1> <DelayedEvaluation#2>)))
   (third nats))
->
(define nats
  (cons 1 (cons <DelayedEvaluation#1> <DelayedEvaluation#2>)))
(+ <DelayedEvaluation#1> (third nats))

Step 3/9
(define nats
  (cons 1 (cons <DelayedEvaluation#1> <DelayedEvaluation#2>)))
(+ <DelayedEvaluation#1> (third nats))
->
(define nats (cons 1 (cons (+ 1 1) <DelayedEvaluation#2>)))
(+ (+ 1 1) (third nats))

Step 4/9
(define nats (cons 1 (cons (+ 1 1) <DelayedEvaluation#2>)))
(+ (+ 1 1) (third nats))
->
(define nats (cons 1 (cons 2 <DelayedEvaluation#2>)))
(+ 2 (third nats))

Figure 3. Lazy Stepper Example 3

Step 26/42
(define-values
  (l t)
  (cond
    (true
     (values (reverse (list (car (list "a" "\n")))) (list "\n")))
    (else
     (L
      (cdr (list "\n"))
      (cons (car (list "\n")) (list (car (list "a" "\n"))))))))
(cons l (cond ((null? t) null) (else (lines (cdr t)))))
->
(define-values
  (l t)
  (values (reverse (list (car (list "a" "\n")))) (list "\n")))
(cons l (cond ((null? t) null) (else (lines (cdr t)))))

Figure 4. Lazy Stepper Example 4

Marlow et al. (2007) show how to use the GHCi debugger to understand
this behavior. Figure 4 demonstrates that the stepper provides
a superior vehicle in this situation. It specifically shows how break
returns two values, which when evaluated, produce a one-character
string "a" and a one-character string "\n", respectively. From the
remainder of the definition of lines, we can deduce that the "a"
string is retained, while the "\n" is dropped by the subsequent call
to cdr, causing the recursive lines call to return an empty list,
i.e., no lines, thus explaining the missing line. If we want lines to
output a final, empty line when there is a trailing "\n" in the input,
the base case must return a list with an empty line, '(()), instead
of just an empty list.
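The effect of changing the base case can be checked with a small eager Python transcription of lines and break. This is a sketch under assumptions: `break_at` and the `base_case` parameter are hypothetical names introduced here, lists of one-character strings stand in for the Racket lists, and laziness is omitted because the missing line does not depend on evaluation order.

```python
def break_at(pred, chars):
    """Split chars at the first element satisfying pred.

    The delimiter stays at the front of the second part, as with break.
    """
    for i, c in enumerate(chars):
        if pred(c):
            return chars[:i], chars[i:]
    return chars, []


def lines(chars, base_case):
    """Split a list of one-character strings into lines.

    base_case is the result for empty input: [] mirrors the original
    definition, [[]] mirrors the suggested repair.
    """
    if not chars:
        return list(base_case)
    line, rest = break_at(lambda c: c == "\n", chars)
    if not rest:
        return [line]
    # Drop the newline delimiter and recur on the remainder.
    return [line] + lines(rest[1:], base_case)


print(lines(["\n", "a"], []))    # [[], ['a']]
print(lines(["a", "\n"], []))    # [['a']]
print(lines(["a", "\n"], [[]]))  # [['a'], []]
```

With the original base case the trailing newline yields no final line; with the `[[]]` base case the final empty line appears.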

3. Lazy Racket Semantics 

Our key theoretical innovation is the novel semantic view of laziness
displayed in our stepper. Following tradition, we present this
idea in the context of a λ-calculus, $\lambda_{LR}$:

\[
\begin{array}{rcl}
e &=& n \mid s \mid b \mid x \mid \lambda x.e \mid (e\ e) \mid (p^2\ e\ e) \\
  &\mid& (\mathrm{cons}\ e\ e) \mid \mathrm{null} \mid (p^1\ e) \mid (\mathrm{if}\ e\ e\ e) \\
n &\in& \mathbb{Z}, \quad s \in \mathit{Strings}, \quad b = \mathrm{true} \mid \mathrm{false} \\
p^2 &=& + \mid - \mid * \mid / \qquad p^1 = \mathrm{first} \mid \mathrm{rest}
\end{array}
\]

The syntax of $\lambda_{LR}$ is identical to the core of most functional
programming languages and includes integers, strings, booleans,
variables, abstractions, applications, primitives, lists, and a conditional.

To specify the semantics of $\lambda_{LR}$, we first extend $e$ by adding a
new expression:

\[
e^{LR} = e \mid (e^{LR})^{\ell} \qquad \ell \in \mathit{Labels}
\]

The "labeled" expression, $(e^{LR})^{\ell}$, consists of a tag $\ell$ and a
subexpression $e^{LR}$. Labeled expressions are not part of the language
syntax but are necessary for evaluation. Rewriting a labeled expression
triggers the simultaneous rewriting of all other expressions
that share the same label. Otherwise, labeled expressions do not
affect program evaluation. The stepper renders labeled expressions
without the label.

We require one constraint on labeled expressions in our language:
all expressions with the same label $\ell$ must be identical. We
call this the consistent labeling property:

Definition 1. A program is consistently labeled if, for all $\ell_1$, $\ell_2$,
$e_1^{LR}$, $e_2^{LR}$, if $(e_1^{LR})^{\ell_1}$ and $(e_2^{LR})^{\ell_2}$ are two
subexpressions in a program, and $\ell_1 = \ell_2$, then $e_1^{LR} = e_2^{LR}$.

3.1 Rewriting Rules

To further formulate a semantics, we define the notion of values:

\[
v = n \mid s \mid b \mid \lambda x.e^{LR} \mid \mathrm{null} \mid (\mathrm{cons}\ (e^{LR})^{\ell}\ (e^{LR})^{\ell}) \mid v^{\ell}
\]

Numbers, strings, booleans, abstractions, and null are values. In
addition, cons expressions where each element is labeled are also
values. Finally, any value tagged with labels is also a value.

In the rewriting of $\lambda_{LR}$ programs, evaluation contexts are used
to determine which part of the program to rewrite next. Evaluation
contexts are expressions where a hole $[\,]$ replaces one subexpression:

\[
\begin{array}{rcl}
E &=& [\,] \mid (E\ e^{LR}) \mid (p^2\ E\ e^{LR}) \mid (p^2\ v\ E) \mid (p^1\ E) \\
  &\mid& (\mathrm{if}\ E\ e^{LR}\ e^{LR}) \mid E^{\ell}
\end{array}
\]
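Each rule rewrites the left-hand side (LHS) with $\longmapsto_{LR}$ to one of two
right-hand sides (RHS):

Column two (redex not under label) applies when
$\nexists E_1, E_2, \ell$ s.t. $E[\,] = E_1[(E_2[\,])^{\ell}]$.

Column three (redex occurs under label) applies when
$E[\,] = E_1[(E_2[\,])^{\ell}]$ and
$\nexists E_3, E_4, \ell'$ s.t. $E_2[\,] = E_3[(E_4[\,])^{\ell'}]$.

$\beta_{LR}$
  LHS:          $E[((\lambda x.e_1^{LR})^{\vec{\ell}}\ e_2^{LR})]$
  Column two:   $E[e_1^{LR}\{x := (e_2^{LR})^{\ell_1}\}]$, fresh $\ell_1$
  Column three: $E[e_1^{LR}\{x := (e_2^{LR})^{\ell_1}\}]\{\!\{\ell \Leftarrow E_2[e_1^{LR}\{x := (e_2^{LR})^{\ell_1}\}]\}\!\}$, fresh $\ell_1$

PRIM
  LHS:          $E[(p^2\ n_1^{\vec{\ell}}\ n_2^{\vec{\ell}})]$
  Column two:   $E[(\delta\ (p^2\ n_1\ n_2))]$
  Column three: $E[(\delta\ (p^2\ n_1\ n_2))]\{\!\{\ell \Leftarrow E_2[(\delta\ (p^2\ n_1\ n_2))]\}\!\}$

CONS
  LHS:          $E[(\mathrm{cons}\ e_1^{LR}\ e_2^{LR})]$, $e_1^{LR}$ unlabeled or $e_2^{LR}$ unlabeled
  Column two:   $E[(\mathrm{cons}\ (e_1^{LR})^{\ell_1}\ (e_2^{LR})^{\ell_2})]$, fresh $\ell_1, \ell_2$
  Column three: $E[(\mathrm{cons}\ (e_1^{LR})^{\ell_1}\ (e_2^{LR})^{\ell_2})]\{\!\{\ell \Leftarrow E_2[(\mathrm{cons}\ (e_1^{LR})^{\ell_1}\ (e_2^{LR})^{\ell_2})]\}\!\}$, fresh $\ell_1, \ell_2$

FIRST
  LHS:          $E[(\mathrm{first}\ (\mathrm{cons}\ e_1^{LR}\ e_2^{LR})^{\vec{\ell}})]$
  Column two:   $E[e_1^{LR}]$
  Column three: $E[e_1^{LR}]\{\!\{\ell \Leftarrow E_2[e_1^{LR}]\}\!\}$

REST
  LHS:          $E[(\mathrm{rest}\ (\mathrm{cons}\ e_1^{LR}\ e_2^{LR})^{\vec{\ell}})]$
  Column two:   $E[e_2^{LR}]$
  Column three: $E[e_2^{LR}]\{\!\{\ell \Leftarrow E_2[e_2^{LR}]\}\!\}$

IF-TRUE
  LHS:          $E[(\mathrm{if}\ \mathrm{true}^{\vec{\ell}}\ e_1^{LR}\ e_2^{LR})]$
  Column two:   $E[e_1^{LR}]$
  Column three: $E[e_1^{LR}]\{\!\{\ell \Leftarrow E_2[e_1^{LR}]\}\!\}$

IF-FALSE
  LHS:          $E[(\mathrm{if}\ \mathrm{false}^{\vec{\ell}}\ e_1^{LR}\ e_2^{LR})]$
  Column two:   $E[e_2^{LR}]$
  Column three: $E[e_2^{LR}]\{\!\{\ell \Leftarrow E_2[e_2^{LR}]\}\!\}$

Figure 5. The $\lambda_{LR}$ Reduction System.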



The $(E\ e^{LR})$ context indicates that the operator in an application is
evaluated before it is applied. The $p^1$ and $p^2$ contexts indicate that
these primitives are strict in all argument positions. The if context
dictates strict evaluation of only the test expression. Finally, the $E^{\ell}$
context dictates that a redex search goes under labeled expressions.
Essentially, when searching for a redex, expressions tagged with a
label are treated as if they were unlabeled.

Evaluation of a $\lambda_{LR}$ program proceeds according to the program
rewriting system in figure 5. For each possible rewriting step, the
program in the first column is partitioned into a redex and a context,
and is rewritten to either the program in the second or the third
column. In the second and third columns, the redex is always
contracted. If the redex does not occur under a label, then it is the only
contracted part of the program (column two). If the redex occurs
under a label, all other instances of the label are similarly
contracted (column three). In the third column, the context is further
subdivided as $E[\,] = E_1[(E_2[\,])^{\ell}]$ where $\ell$ is the label nearest
the redex, $E_1$ is the context around the $\ell$-labeled expression, and
$E_2$ is the context under label $\ell$ surrounding the redex. Thus $E_2$
contains no additional labels. An "update" function is used to perform
the parallel reduction. The update function uses the notation
$e_1^{LR}\{\!\{\ell \Leftarrow e_2^{LR}\}\!\}$ to mean that all expressions in
$e_1^{LR}$ immediately under a label $\ell$ are replaced with $e_2^{LR}$.
The function is formally defined in figure 6. The last clause in the
definition covers all cases not included by the preceding clauses.

The $\beta_{LR}$ rule specifies that function application occurs before
the evaluation of arguments. To remember where expressions originate,
the argument receives a label $\ell_1$ before substitution is performed.
The notation $(e^{LR})^{\vec{\ell}}$ represents an expression wrapped in one
or more labels. During a rewriting step, labels are discarded from
values because no further reduction is possible.

Binary primitive applications are strict in their arguments, as
seen in the PRIM rule. The $\delta$ function interprets binary primitive
applications and is defined in the standard way (division by 0 results
in a stuck state).

The CONS rule shows that, if either argument is unlabeled, both
arguments are wrapped with labels. Adding an extra label around an
already labeled expression will not change the rewriting sequence
of the program because the parallel updating function only uses
the innermost label. The FIRST and REST rules extract the first and
second components from a cons cell, respectively, and the IF-TRUE
and IF-FALSE rules similarly choose the first or second branch of
the if expression.



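\[
\begin{array}{rcl}
(e_1^{LR})^{\ell_1}\{\!\{\ell_1 \Leftarrow e_2^{LR}\}\!\} &=& (e_2^{LR})^{\ell_1} \\
(e_1^{LR})^{\ell_1}\{\!\{\ell_2 \Leftarrow e_2^{LR}\}\!\} &=& (e_1^{LR}\{\!\{\ell_2 \Leftarrow e_2^{LR}\}\!\})^{\ell_1}, \quad \ell_1 \neq \ell_2 \\
(\lambda x.e_1^{LR})\{\!\{\ell \Leftarrow e_2^{LR}\}\!\} &=& \lambda x.(e_1^{LR}\{\!\{\ell \Leftarrow e_2^{LR}\}\!\}) \\
(e_1^{LR}\ e_2^{LR})\{\!\{\ell \Leftarrow e_3^{LR}\}\!\} &=& (e_1^{LR}\{\!\{\ell \Leftarrow e_3^{LR}\}\!\}\ \ e_2^{LR}\{\!\{\ell \Leftarrow e_3^{LR}\}\!\}) \\
(p^2\ e_1^{LR}\ e_2^{LR})\{\!\{\ell \Leftarrow e_3^{LR}\}\!\} &=& (p^2\ e_1^{LR}\{\!\{\ell \Leftarrow e_3^{LR}\}\!\}\ \ e_2^{LR}\{\!\{\ell \Leftarrow e_3^{LR}\}\!\}) \\
(\mathrm{cons}\ e_1^{LR}\ e_2^{LR})\{\!\{\ell \Leftarrow e_3^{LR}\}\!\} &=& (\mathrm{cons}\ e_1^{LR}\{\!\{\ell \Leftarrow e_3^{LR}\}\!\}\ \ e_2^{LR}\{\!\{\ell \Leftarrow e_3^{LR}\}\!\}) \\
(p^1\ e_1^{LR})\{\!\{\ell \Leftarrow e_2^{LR}\}\!\} &=& (p^1\ e_1^{LR}\{\!\{\ell \Leftarrow e_2^{LR}\}\!\}) \\
(\mathrm{if}\ e_1^{LR}\ e_2^{LR}\ e_3^{LR})\{\!\{\ell \Leftarrow e_4^{LR}\}\!\} &=& (\mathrm{if}\ e_1^{LR}\{\!\{\ell \Leftarrow e_4^{LR}\}\!\}\ \ e_2^{LR}\{\!\{\ell \Leftarrow e_4^{LR}\}\!\}\ \ e_3^{LR}\{\!\{\ell \Leftarrow e_4^{LR}\}\!\}) \\
\text{otherwise, } e_1^{LR}\{\!\{\ell \Leftarrow e_2^{LR}\}\!\} &=& e_1^{LR}
\end{array}
\]

Figure 6. Definition of parallel update function.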
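To make the parallel update concrete, here is a minimal Python sketch of such a function over a toy term representation. Everything in it is an assumption made for illustration: terms are nested tuples with hypothetical tags (`"label"`, `"lam"`, `"app"`, `"prim1"`, `"prim2"`, `"cons"`, `"if"`), and the sketch assumes that an updated expression keeps its label, which is what lets later steps rewrite the shared copies again.

```python
def update(expr, label, new):
    """Replace every expression immediately under `label` with `new`."""
    if isinstance(expr, tuple) and expr[0] == "label":
        _, this_label, body = expr
        if this_label == label:
            # Matching label: swap in the new expression, keep the label.
            return ("label", this_label, new)
        # Different label: keep searching underneath it.
        return ("label", this_label, update(body, label, new))
    if isinstance(expr, tuple):
        tag = expr[0]
        if tag == "lam":  # ("lam", parameter, body)
            return ("lam", expr[1], update(expr[2], label, new))
        if tag in ("prim1", "prim2"):  # (tag, operator, operand, ...)
            return (tag, expr[1]) + tuple(
                update(e, label, new) for e in expr[2:])
        if tag in ("app", "cons", "if"):  # (tag, subexpression, ...)
            return (tag,) + tuple(update(e, label, new) for e in expr[1:])
    # Constants, variables, and anything else are left unchanged.
    return expr


# (+ (+ 1 (+ 2 3)) (+ 1 (+ 2 3))) with both operands sharing label "l1".
arg = ("prim2", "+", 1, ("prim2", "+", 2, 3))
program = ("prim2", "+", ("label", "l1", arg), ("label", "l1", arg))

# Contracting (+ 2 3) under the label gives (+ 1 5) for every shared copy.
contracted = ("prim2", "+", 1, 5)
stepped = update(program, "l1", contracted)
print(stepped)
```

Applied to the first example of section 2, one call rewrites both copies of the argument, matching the simultaneous reduction the stepper shows in step 3.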

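Program rewriting preserves the consistent labeling property.

Lemma 1. If $e_1^{LR}$ is consistently labeled and
$e_1^{LR} \longmapsto_{LR} e_2^{LR}$, then $e_2^{LR}$ is consistently labeled.

The rewriting rules are deterministic because any expression $e^{LR}$
can be uniquely partitioned into an evaluation context and a redex.
Thus if $e_1^{LR}$ rewrites to an expression $e_2^{LR}$, then $e_1^{LR}$
rewrites to $e_2^{LR}$ in a canonical manner.

We can then use $\longmapsto_{LR}$ to define an evaluator:

\[
\mathit{eval}_{LR}(e) =
\begin{cases}
v, & \text{if } e \longmapsto^{*}_{LR} v \\
\bot, & \text{if, for all } e \longmapsto^{*}_{LR} e_1^{LR},\ e_1^{LR} \longmapsto_{LR} e_2^{LR} \\
\mathsf{error}, & \text{if } e \longmapsto^{*}_{LR} e_1^{LR},\ e_1^{LR} \neq v,\ \nexists e_2^{LR} \text{ such that } e_1^{LR} \longmapsto_{LR} e_2^{LR}
\end{cases}
\]

where $\longmapsto^{*}_{LR}$ is the reflexive-transitive closure of $\longmapsto_{LR}$.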
3.2 A New Call-by-Need Lambda Calculus

As is, the $\longmapsto_{LR}$ relation cannot describe the standard reduction
sequences of any calculus. Each rule in figure 5 replaces the entire
program with another, in a non-compositional manner. Put differently,
the table of relations does not show how a standard reduction
semantics is created from a basic notion of reduction (Barendregt
1984), like β, made compatible with the syntactic constructions of
a leftmost-outermost context (Felleisen et al. 2009; Plotkin 1975).

In this section, we sketch how our rewriting semantics relates to
a plain, yet novel call-by-need calculus. Put concisely, the calculus
replaces the deref notion of reduction in Ariola and Felleisen's
(1997) call-by-need calculus⁴ with a rule that exploits the function
parameter for sharing but implements evaluation via substitution.
The calculus comes with a standard reduction theorem, and it is
possible to show that our rewriting semantics essentially relates the
program to its answer via the same steps as the standard reduction
sequences of the calculus. Due to a lack of space, we merely spell
out the basic ideas without stating any theorems or proofs.

For simplicity, we take the syntax of the λ-calculus as the
starting point:

\[
e^{AF} = x \mid \lambda x.e^{AF} \mid (e^{AF}\ e^{AF})
\]

Evaluation of a $\lambda_{AF}$ program terminates when it is reduced to an
answer $a^{AF}$:

\[
\begin{array}{rcl}
v^{AF} &=& \lambda x.e^{AF} \\
a^{AF} &=& v^{AF} \mid ((\lambda x.a^{AF})\ e^{AF})
\end{array}
\]

Programs reduce to answers instead of values because reduction
does not remove application terms. The specification of a notion of
reduction relies on the notion of an evaluation context $E^{AF}$:

\[
E^{AF} = [\,] \mid (E^{AF}\ e^{AF}) \mid ((\lambda x.E^{AF}[x])\ E^{AF}) \mid ((\lambda x.E^{AF})\ e^{AF})
\]

The evaluation contexts, especially the third one, specify that
arguments are not evaluated until they are needed by some variable in
the function body.

Here are the three notions of reductions (axioms) from Ariola
and Felleisen's call-by-need calculus:

\[
\begin{array}{rcll}
((\lambda x.E^{AF}[x])\ v^{AF}) &=& ((\lambda x.E^{AF}[v^{AF}])\ v^{AF}) & \text{(deref)} \\
(((\lambda x.a^{AF})\ e_1^{AF})\ e_2^{AF}) &=& ((\lambda x.(a^{AF}\ e_2^{AF}))\ e_1^{AF}) & \text{(lift)} \\
((\lambda x.E^{AF}[x])\ ((\lambda y.a^{AF})\ e^{AF})) &=& ((\lambda y.((\lambda x.E^{AF}[x])\ a^{AF}))\ e^{AF}) & \text{(assoc)}
\end{array}
\]

The deref axiom substitutes the evaluated argument for the variable
in the function body. The other two axioms deal with answers that
may appear on the left-hand or right-hand side of an application.

One problem is that the deref axiom leaves the application
alone, even after the argument has been reduced to a value. Clearly,
doing so contradicts both the natural implementations (which use
a mix of graph rewriting and memoizing) and the widely used
Launchbury semantics (which uses a store-based semantics to
mimic memoizing). To get closer to this semantics, we propose
to replace the deref axiom with the following $\beta_{need}$ axiom:

\[
((\lambda x.E^{AF}[x])\ v^{AF}) = E^{AF}[x]\{x := v^{AF}\} \qquad (\beta_{need})
\]

This axiom says that when a parameter occurs in a "demand"
position (the hole of an evaluation context) and the argument is a
value, then plain old substitution captures the essence of parameter
passing in a call-by-need language.
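As a small worked illustration, consider the term $((\lambda x.(x\ x))\ (\lambda y.y))$, chosen here only as an example. Taking $E^{AF} = ([\,]\ x)$ gives $E^{AF}[x] = (x\ x)$, and the argument $\lambda y.y$ is already a value, so the axiom applies directly:

```latex
\begin{align*}
((\lambda x.(x\ x))\ (\lambda y.y))
  &= (x\ x)\{x := \lambda y.y\}
   = ((\lambda y.y)\ (\lambda y.y))
   && (\beta_{need},\ E^{AF} = ([\,]\ x)) \\
  &= y\{y := \lambda y.y\}
   = \lambda y.y
   && (\beta_{need},\ E^{AF} = [\,])
\end{align*}
% For comparison, deref keeps the outer application in place:
% ((\lambda x.(x\ x))\ (\lambda y.y)) = ((\lambda x.((\lambda y.y)\ x))\ (\lambda y.y))
```

Under deref the outer application survives the step, whereas the substitution-based axiom removes it once the argument is a value.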

The new axiom is indeed a proper notion of reduction and is
applicable in any context. Like Ariola and Felleisen's calculus, our
revised lazy λ-calculus is confluent (satisfies the Church-Rosser
property) and comes with a Curry-Feys style standard reduction
theorem. Furthermore, there is a simple bisimulation that relates
standard reduction sequences to the semantics of figure 5.

4. Lazy Stepper Implementation 

Figure 7 summarizes the software architecture of our stepper. The
first row depicts a $\lambda_{LR}$ Lazy Racket rewriting sequence. To construct
this rewriting sequence, the lazy stepper first macro-expands
a Lazy Racket program into a functional Racket program, enriched
with delay and force, as mentioned in section 1. In turn, the stepper
for (eager) Racket annotates the expanded program. Executing
an annotated Racket program emits a series of output values, from
which the reduction sequence for the unannotated Racket program
is reconstructed. Once the lazy stepper has the plain Racket reduction
sequence, it synthesizes each step to assemble the desired Lazy
Racket rewriting sequence.

The correctness of the lazy stepper thus depends on two claims: 

1. The reduction sequence of a plain Racket program can be re- 
constructed from the output produced when evaluating an an- 
notated version of that program. 

2. The rewriting sequence of a Lazy Racket program is equivalent 
to the reduction sequence of the corresponding plain Racket 
program, modulo macro-expansion and synthesis steps. 

The first point corresponds to the work of Clements et al. (2001)
and is depicted by the bottom half of figure 7. The second point is
depicted by the top half of figure 7. The rest of the section formally
presents the architecture in enough detail so that our stepper can be
implemented for other programming languages, and so that we can
prove its correctness. The actual correctness theorem and proof can
be found in the next section.

4.1 Racket + delay/force 

When the stepper is invoked on a Lazy Racket program, the source
is first macro-expanded to a plain Racket program. Racket programs
are eagerly evaluated, so lazy evaluation is simulated with
the insertion of delay and force constructs. We model this expanded
language with $\lambda_{DF}$, a core calculus of functional Racket
with delay and force:

\[
\begin{array}{rcl}
e^{RKT} &=& n \mid s \mid b \mid x \mid \lambda x.e^{RKT} \mid (e^{RKT}\ e^{RKT}) \\
  &\mid& (\mathrm{cons}\ e^{RKT}\ e^{RKT}) \mid \mathrm{null} \mid (p^2\ e^{RKT}\ e^{RKT}) \mid (p^1\ e^{RKT}) \\
  &\mid& (\mathrm{if}\ e^{RKT}\ e^{RKT}\ e^{RKT}) \mid (\mathrm{delay}\ e^{RKT}) \mid (\mathrm{force}\ e^{RKT}) \\
n &\in& \mathbb{Z}, \quad s \in \mathit{Strings}, \quad b = \mathrm{true} \mid \mathrm{false} \\
p^2 &=& + \mid - \mid * \mid / \qquad p^1 = \mathrm{first} \mid \mathrm{rest}
\end{array}
\]

4 The calculus of Maraist et al. (1998) is unrelated in this case.

The syntax of $\lambda_{DF}$ is similar to $\lambda_{LR}$ except that delay and force
replace labeled expressions. The delay construct suspends evaluation
of its argument in a thunk; applying force to a thunk evaluates
the suspended computation and memoizes the result. In addition,
applying force to a suspended computation wrapped in multiple,
nested thunks forces all the thunks, while applying force to a value
returns that value.
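The behavior of delay and force described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the Racket implementation: `Location`, `store`, `delay`, and `force` are hypothetical names, and a dictionary plays the role of a store that holds only delayed computations and their memoized results.

```python
class Location:
    """An opaque reference to an entry in the store."""


# Maps a Location to ("delayed", computation) or ("forced", value).
store = {}


def delay(compute):
    """Suspend a zero-argument computation and return a fresh location."""
    loc = Location()
    store[loc] = ("delayed", compute)
    return loc


def force(x):
    """Force x, following nested locations; non-locations are returned as is."""
    while isinstance(x, Location):
        state, payload = store[x]
        if state == "delayed":
            payload = payload()             # run the suspended computation
            store[x] = ("forced", payload)  # memoize the result
        x = payload                         # may itself be a location
    return x


runs = 0  # counts how often the suspended expression is evaluated


def expensive():
    global runs
    runs += 1
    return 1 + (2 + 3)


nested = delay(lambda: delay(expensive))
print(force(nested), force(nested), runs)  # 6 6 1
print(force(42))                           # 42
```

The loop in `force` handles the nested case: forcing the outer location yields the inner location, which is then forced in turn, and both results are memoized.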

The semantics of $\lambda_{DF}$ combines the usual call-by-value world
with store effects. We describe it with a high-level abstract machine,
specifically, a CS machine (Felleisen et al. 2009). The C in
the CS machine stands for control string and the S is a store that
represents physical memory. In our machine the control string is an
expression that may contain locations, i.e., references to delayed
expressions in the store. In contrast to the standard CS machine,
the store in our machine only holds delayed computations.
[Figure 7: architecture diagram with three rows, labeled Lazy Racket,
Racket, and Annotated Racket.]

Figure 7. Stepper Implementation Architecture



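\[
\begin{array}{lcll}
\langle E^{DF}[((\lambda x.e^{DF})\ v^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[e^{DF}\{x := v^{DF}\}], \sigma \rangle & \beta_v \\
\langle E^{DF}[(p^2\ v_1^{DF}\ v_2^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[(\delta\ (p^2\ v_1^{DF}\ v_2^{DF}))], \sigma \rangle & \text{PRIM} \\
\langle E^{DF}[(\mathrm{first}\ (\mathrm{cons}\ v_1^{DF}\ v_2^{DF}))], \sigma \rangle &\longmapsto& \langle E^{DF}[v_1^{DF}], \sigma \rangle & \text{FIRST} \\
\langle E^{DF}[(\mathrm{rest}\ (\mathrm{cons}\ v_1^{DF}\ v_2^{DF}))], \sigma \rangle &\longmapsto& \langle E^{DF}[v_2^{DF}], \sigma \rangle & \text{REST} \\
\langle E^{DF}[(\mathrm{if}\ \mathrm{true}\ e_1^{DF}\ e_2^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[e_1^{DF}], \sigma \rangle & \text{IF-TRUE} \\
\langle E^{DF}[(\mathrm{if}\ \mathrm{false}\ e_1^{DF}\ e_2^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[e_2^{DF}], \sigma \rangle & \text{IF-FALSE} \\
\langle E^{DF}[(\mathrm{delay}\ e^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[\ell], \sigma[\ell \leftarrow e^{DF}] \rangle, \quad \ell \notin \mathrm{dom}(\sigma) & \text{DELAY} \\
\langle E^{DF}[(\mathrm{force}\ \ell)], \sigma \rangle &\longmapsto& \langle E^{DF}[(\mathrm{force}\ (\mathrm{force}\ \ell\ \sigma(\ell)))], \sigma \rangle & \text{FORCE-DELAY} \\
\langle E^{DF}[(\mathrm{force}\ \ell\ v^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[v^{DF}], \sigma[\ell \leftarrow v^{DF}] \rangle & \text{FORCE-UPDATE} \\
\langle E^{DF}[(\mathrm{force}\ v^{DF})], \sigma \rangle &\longmapsto& \langle E^{DF}[v^{DF}], \sigma \rangle, \quad v^{DF} \notin \mathit{Locations} & \text{FORCE-NONDELAY}
\end{array}
\]

Figure 8. CS Machine Transitions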



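Here is the specification of our CS machine:

$$
\begin{array}{rcll}
P^{DF} & = & \langle c^{DF}, \sigma\rangle & \text{(Machine States)} \\
S^{DF} & = & P_1^{DF}, \ldots, P_n^{DF} & \text{(Transition Sequences)} \\
c^{DF} & = & E^{DF}[e^{DF}] & \text{(Control Strings)} \\
e^{DF} & = & e^{RKT} \mid \ell & \text{(Machine Expressions)} \\
\sigma & = & ((\ell\ e^{DF}), \ldots) & \text{(Stores)} \\
\ell & \in & \text{Locations} & \\
E^{DF} & = & [\ ] \mid (E^{DF}\ e^{DF}) \mid (v^{DF}\ E^{DF}) & \text{(Evaluation Contexts)} \\
 & \mid & (p^2\ E^{DF}\ e^{DF}) \mid (p^2\ v^{DF}\ E^{DF}) & \\
 & \mid & (\texttt{cons}\ E^{DF}\ e^{DF}) \mid (\texttt{cons}\ v^{DF}\ E^{DF}) \mid (p^1\ E^{DF}) & \\
 & \mid & (\texttt{if}\ E^{DF}\ e^{DF}\ e^{DF}) & \\
 & \mid & (\texttt{force}\ E^{DF}) \mid (\texttt{force}\ \ell\ E^{DF}) & \\
v^{DF} & = & n \mid s \mid b \mid \lambda x.e^{DF} \mid (\texttt{cons}\ v^{DF}\ v^{DF}) \mid \texttt{null} \mid \ell & \text{(Values)}
\end{array}
$$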

The store in a machine configuration is represented as a list of
pairs. In the above specification, an ellipsis means "zero or more of
the preceding element". The evaluation contexts are the standard
call-by-value contexts, plus two force contexts. The first force
context resembles the force expression in a program and indicates
that some arbitrary expression is being forced. The second force
context is used for the evaluation of a specific delayed computation.
It remembers a location so the store can be updated after the
evaluation is complete. Evaluation under a (force $\ell$ [ ]) context
corresponds to evaluation under a label in $\lambda_{LR}$.

The starting machine configuration for a Racket program $e^{RKT}$
is $\langle e^{RKT}, ()\rangle$ where the program, in an empty
context, is set as the initial control string, and the store is
initially empty. Evaluation stops when the control string is a value.
Values are numbers, strings, booleans, abstractions, lists, or store
locations. Our CS machine transitions are in figure 8. Every program
$e^{RKT}$ has a deterministic transition sequence because the left
hand sides of all the transition rules are mutually exclusive and
cover all possible control strings in the C register.

The $\beta_v$, PRIM, FIRST, REST, IF-TRUE, and IF-FALSE transitions
are standard call-by-value transitions. The DELAY transition reduces
a delay expression to an unused location $\ell$. The delayed
computation is saved at that location in the store. When the argument
to a force expression is a location, the suspended expression at that
location is retrieved from the store and plugged into a special force
evaluation context, as specified by the FORCE-DELAY transition. The
outer force evaluation context is retained in case there are nested
delays. The special context saves the store location of the forced
expression, so the store can be updated with the resulting value, as
dictated by the FORCE-UPDATE transition. Finally, the FORCE-NONDELAY
transition specifies that forcing a non-location value results in the
removal of the outer force context.
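The delay and force transitions can be exercised with a small
interpreter. The following Python sketch covers only a subset of the
machine (integers, the + primitive, delay, and force) and finds the
redex by recursive descent instead of an explicit evaluation context.
The tuple encoding, the "force-at" tag for (force $\ell$ e), and the
function names are assumptions made for this illustration; adding a
location to a number raises an error, which stands in for a stuck
state.

```python
def is_loc(e):
    return isinstance(e, tuple) and e[0] == "loc"


def is_value(e):
    return isinstance(e, int) or is_loc(e)


def step(e, store):
    """Perform one transition on a non-value control string."""
    tag = e[0]
    if tag == "+":
        _, a, b = e
        if not is_value(a):                   # context (p2 E e)
            a2, s2 = step(a, store)
            return ("+", a2, b), s2
        if not is_value(b):                   # context (p2 v E)
            b2, s2 = step(b, store)
            return ("+", a, b2), s2
        return a + b, store                   # PRIM
    if tag == "delay":                        # DELAY: pick an unused location
        loc = ("loc", len(store))
        return loc, {**store, loc: e[1]}
    if tag == "force":
        arg = e[1]
        if not is_value(arg):                 # context (force E)
            a2, s2 = step(arg, store)
            return ("force", a2), s2
        if is_loc(arg):                       # FORCE-DELAY: keep the outer force
            return ("force", ("force-at", arg, store[arg])), store
        return arg, store                     # FORCE-NONDELAY
    if tag == "force-at":
        _, loc, body = e
        if not is_value(body):                # context (force l E)
            b2, s2 = step(body, store)
            return ("force-at", loc, b2), s2
        return body, {**store, loc: body}     # FORCE-UPDATE: memoize the value
    raise ValueError(f"stuck control string: {e!r}")


def run(program):
    """Iterate step from an empty store; return value, store, and trace."""
    e, store, trace = program, {}, [program]
    while not is_value(e):
        e, store = step(e, store)
        trace.append(e)
    return e, store, trace


value, store, trace = run(("+", ("force", ("delay", ("+", 1, 2))), 4))
```

In this run the trace passes through DELAY, FORCE-DELAY, PRIM,
FORCE-UPDATE, FORCE-NONDELAY, and a final PRIM, and the store ends up
holding the memoized value at the allocated location.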



4.2 Continuation Marks 

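A stepper for a functional language needs access to the control
stack of its evaluator in order to reconstruct the evaluation steps.
In a low-level stepper implementation, the stepper would be granted
complete, privileged access to the control stack. As Clements et al.
(2001) argued, however, such privileged access is unnecessary and
often undesirable.

$$
\begin{array}{lcll}
\langle (\texttt{wcm}\ c_1^{CM}\ c_2^{CM}), \sigma, K^{CM}, m\rangle & \longmapsto_{cskm} & \langle c_1^{CM}, \sigma, (\texttt{wcm}\ c_2^{CM}\ K^{CM}\ m), \emptyset\rangle & \text{WCM:EXP} \\
\langle v^{CM}, \sigma, (\texttt{wcm}\ c^{CM}\ K^{CM}\ m), m'\rangle & \longmapsto_{cskm} & \langle c^{CM}, \sigma, K^{CM}, v^{CM}\rangle & \text{WCM:VAL} \\
\langle (\texttt{ccm}), \sigma, K^{CM}, m\rangle & \longmapsto_{cskm} & \langle v^{CM}, \sigma, K^{CM}, m\rangle \ \text{where } v^{CM} = \pi(K^{CM}, m) & \text{CCM} \\
\langle (\texttt{output}\ c^{CM}), \sigma, K^{CM}, m\rangle & \longmapsto_{cskm} & \langle c^{CM}, \sigma, (\texttt{output}\ K^{CM}\ m), \emptyset\rangle & \text{OUTPUT:EXP} \\
\langle v^{CM}, \sigma, (\texttt{output}\ K^{CM}\ m), m'\rangle & \overset{v^{CM}}{\longmapsto}_{cskm} & \langle 42, \sigma, K^{CM}, m\rangle & \text{OUTPUT:VAL} \\
\langle (\texttt{loc?}\ c^{CM}), \sigma, K^{CM}, m\rangle & \longmapsto_{cskm} & \langle c^{CM}, \sigma, (\texttt{loc?}\ K^{CM}\ m), \emptyset\rangle & \text{LOC?:EXP} \\
\langle \ell, \sigma, (\texttt{loc?}\ K^{CM}\ m), m'\rangle & \longmapsto_{cskm} & \langle \texttt{true}, \sigma, K^{CM}, m\rangle & \text{LOC?-TRUE:VAL} \\
\langle v^{CM}, \sigma, (\texttt{loc?}\ K^{CM}\ m), m'\rangle, \quad v^{CM} \notin \text{Locations} & \longmapsto_{cskm} & \langle \texttt{false}, \sigma, K^{CM}, m\rangle & \text{LOC?-FALSE:VAL}
\end{array}
$$

Figure 9. CSKM Machine Transitions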

Continuation marks are a lightweight stack-access mechanism. 
The stepper for Lazy Racket reuses Clements's stepper for Racket, 
which utilizes continuation marks to reconstruct a program's con- 
trol stack, i.e., the evaluation context. There are two available oper- 
ations for these novel values: 

1. store a value in the current frame of the control stack,

2. retrieve all stored continuation marks. 

Using these two operations it is possible to implement a stepper 
without coupling it directly to the evaluator. 

Clements et al. (2001) present such a stepper for the eager
Racket evaluator. The stepper first annotates a source program
with continuation mark store and retrieve operations at appropriate
points. Then, at each retrieve point, the stepper reconstructs and
outputs a reduction step from information stored in the continuation
marks. Our stepper extends Clements's model with delay and
force constructs. The annotation and reconstruction functions are
formally defined in section 3.

4.3 Racket + delay/force + Continuation Marks 

After a Lazy Racket program is expanded to a plain Racket program,
the lazy stepper annotates the plain Racket program with
continuation mark operations. The language $\lambda_{CM}$ extends
$\lambda_{DF}$ in a stratified manner and models the language for
annotated programs.

$$e^{CM} = \ldots \mid (\texttt{ccm}) \mid (\texttt{wcm}\ e^{CM}\ e^{CM}) \mid (\texttt{output}\ e^{CM}) \mid (\texttt{loc?}\ e^{CM})$$

$\lambda_{CM}$ adds four kinds of expressions to $\lambda_{DF}$: wcm,
ccm, output, and a loc? predicate. When a wcm, or "with continuation
mark", expression is evaluated, its first argument is evaluated and
stored in the current stack frame before its second argument is
evaluated. A ccm, or "current continuation marks", expression
evaluates to a list of all continuation marks currently stored in the
stack. When reducing an output expression, its argument is evaluated
and sent to an output channel. An output expression is evaluated only
for this side effect, so the result of its evaluation is
inconsequential. The loc? predicate identifies locations and is
needed by annotated programs.

4.4 CSKM Machine 

To model continuation marks, having an explicit control stack is
helpful, so we convert our CS machine to a CSK machine, where
the evaluation context is separated and removed from the control
string in the C register and relocated to a new K register. The
conversion to a CSK machine is straightforward and is accomplished
using known techniques (Felleisen et al. 2009). In addition, we
pair each context with a continuation mark m, which is stored in
a fourth "M" register, giving us a CSKM machine.

For the control stack in the K register, we use an "inverted"
evaluation context structure, meaning that the innermost context is
now most easily accessible, giving us a more realistic stack
structure. This new representation is called a continuation and there
is a one-to-one correspondence between evaluation contexts and
continuations. For example, an evaluation context
$E^{CM}[([\ ]\ c^{CM})]$ becomes (app1 $c^{CM}$ $K^{CM}$ $m$), where
the $E^{CM}$ context corresponds to the $K^{CM}$ continuation. The
other evaluation contexts are similarly converted. Note the extra
continuation mark m associated with the continuation. Here are all
the continuations $K^{CM}$:

$$
\begin{array}{rcl}
K^{CM} & = & \texttt{mt} \mid (\texttt{app1}\ c^{CM}\ K^{CM}\ m) \mid (\texttt{app2}\ v^{CM}\ K^{CM}\ m) \\
 & \mid & (\texttt{prim2-1}\ p^2\ c^{CM}\ K^{CM}\ m) \mid (\texttt{prim2-2}\ p^2\ v^{CM}\ K^{CM}\ m) \\
 & \mid & (\texttt{cons1}\ c^{CM}\ K^{CM}\ m) \mid (\texttt{cons2}\ v^{CM}\ K^{CM}\ m) \\
 & \mid & (\texttt{prim1}\ p^1\ K^{CM}\ m) \mid (\texttt{if}\ c_1^{CM}\ c_2^{CM}\ K^{CM}\ m) \\
 & \mid & (\texttt{force}\ K^{CM}\ m) \mid (\texttt{force}\ \ell\ K^{CM}\ m) \\
 & \mid & (\texttt{wcm}\ c^{CM}\ K^{CM}\ m) \mid (\texttt{output}\ K^{CM}\ m) \mid (\texttt{loc?}\ K^{CM}\ m)
\end{array}
$$
The configurations of the CSKM machine are:

$$
\begin{array}{rcll}
P^{CM} & = & \langle c^{CM}, \sigma, K^{CM}, m\rangle & \text{(Machine States)} \\
S^{CM} & = & P_1^{CM}, \ldots, P_n^{CM} & \text{(Transition Sequences)} \\
c^{CM} & = & e^{CM} \mid \ell & \text{(Control Strings)} \\
v^{CM} & = & v^{DF} & \text{(Values)} \\
\sigma & = & ((\ell\ c^{CM}), \ldots) & \text{(Stores)} \\
m & = & \emptyset \mid v^{CM} & \text{(Mark Register)}
\end{array}
$$

Control strings are again extended to include location expressions,
values are the same as CS machine values, and stores map locations to
control string expressions. The mark register m is either empty or
contains a value. For simplicity, we assume that only one mark can be
associated with a continuation frame.

The transitions for our CSKM machine are in figure 9. In order
to formally model output, we add an extra tag to each transition, so
our machine operates as a labeled transition system (Keller 1976).
When the machine emits output, the transition is tagged with the
outputted value; otherwise, the transition tag is $\emptyset$. If a
transition has no output tag, it means the output is inconsequential
in the current context. In our machine, only an output expression
emits output. For space reasons, we only include the transitions for
the new constructs: wcm, ccm, output, and loc?. The other transitions
are easily derived from the transitions for the CS machine (Felleisen
et al. 2009).

The starting configuration for a program $e^{CM}$ is
$\langle e^{CM}, (), \texttt{mt}, \emptyset\rangle$ and evaluation
halts when the control string is a value and the control stack is mt.
Every program $e^{CM}$ has a deterministic transition sequence
because the left hand sides of all the transition rules are mutually
exclusive and cover all possible C and K register combinations.

The WCM:EXP transition sets the first argument as the control
string and saves the second argument in a wcm continuation. In the
resulting machine configuration, the m register is reinitialized to
$\emptyset$ because a new frame is pushed onto the stack. When
evaluation of the first wcm argument is complete, the resulting value
is set as the new continuation mark, as dictated by the WCM:VAL
transition. This new continuation mark overwrites any previous mark.

The CCM transition uses the $\pi$ function to retrieve all
continuation marks in the stack. The $\pi$ function is defined as
follows:

$$
\begin{array}{rcl}
(\pi\ \texttt{mt}\ m) & = & (\texttt{cons}\ m\ \texttt{null}) \\
(\pi\ (\texttt{app1}\ c^{CM}\ K^{CM}\ m')\ m) & = & (\texttt{cons}\ m\ (\pi\ K^{CM}\ m')) \\
(\pi\ (\texttt{app2}\ v^{CM}\ K^{CM}\ m')\ m) & = & (\texttt{cons}\ m\ (\pi\ K^{CM}\ m')) \\
 & \vdots & \\
(\pi\ (\texttt{loc?}\ K^{CM}\ m')\ m) & = & (\texttt{cons}\ m\ (\pi\ K^{CM}\ m'))
\end{array}
$$

Only the first few cases are shown. The rest of the definition, for
other continuations, is similarly defined. The OUTPUT:EXP transition
sets the argument in an output expression as the control string and
pushes a new output continuation frame onto the control stack. Again,
the continuation mark register is initialized to $\emptyset$ due to
the new stack frame. When the argument is evaluated, the resulting
value is emitted as output, as modeled by the label on the OUTPUT:VAL
transition. Finally, the LOC?:EXP, LOC?-TRUE:VAL, and LOC?-FALSE:VAL
transitions are defined in the expected manner, producing true if the
loc? predicate is applied to a location, and false otherwise.

    A[(force e^RKT)] =
    (let*
      ([v0 (wcm (list "force") A[e^RKT])]
       [v1
        (if (not (loc? v0))
            v0
            (wcm (list "force")                         ; box 1
              (wcm (list "force" v0)                    ; box 2
                (let*
                  ([v2 (force v0)]                      ; v0 is a location
                   [t0 (output                          ; box 3
                         (cons (list "val" v0 Q[v2])    ; box 4: v0
                               (rest (ccm))))])         ; box 5
                  v2))))]
       [t1 (output (cons Q[v1] (ccm)))])
      v1)

    Figure 10 (force case). Annotation function for delay and force.
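To make the $\pi$ function concrete, the following Python sketch walks
an inverted continuation and collects one mark per frame, innermost
first. The Frame record and its field names are assumptions of this
illustration: None stands for both the empty continuation mt and the
empty mark, and the sketch follows the definition literally, so empty
marks are not filtered out.

```python
from typing import Any, NamedTuple, Optional


class Frame(NamedTuple):
    """One continuation frame of the K register."""
    kind: str                    # e.g. "app1", "prim2-1", "loc?"
    payload: tuple               # frame-specific contents, unused by pi
    rest: Optional["Frame"]      # enclosing continuation; None stands for mt
    saved_mark: Any              # mark saved in the frame; None means empty


def pi(k, m):
    """Return the mark register followed by the marks saved in each frame."""
    marks = [m]
    while k is not None:
        marks.append(k.saved_mark)
        k = k.rest
    return marks


outer = Frame("app1", ("c",), None, "outer-mark")
inner = Frame("prim2-1", ("+", 5), outer, "middle-mark")
marks = pi(inner, "inner-mark")
```

Here marks lists the current mark first and the outermost saved mark
last, which is the order an annotated program sees when it calls
(ccm).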



5. Correctness 

Unlike most IDE tools, an algebraic stepper comes with a concise
specification: the $\lambda_{LR}$ rewriting system. Specifically, the
stepper displays $\lambda_{LR}$ rewriting sequences after removing all
labels from the terms. It is therefore relatively straightforward to
state a correctness theorem for the stepper, assuming a function
$\zeta$ that strips a $\lambda_{LR}$ program of its labels.

Theorem 1 (Stepper Correctness). If the stepper displays the sequence
$e, \zeta[e_1^{LR}], \ldots, \zeta[e_n^{LR}]$ for some Lazy Racket
program e, then
$e \twoheadrightarrow_{LR} e_1^{LR} \twoheadrightarrow_{LR} \cdots \twoheadrightarrow_{LR} e_n^{LR}$.

The statement of the theorem's conclusion involves multistep
rewriting because some rewriting steps, such as CONS, merely add
labels and change nothing else about the term.

The proof of the theorem consists of two distinct steps. First, we
show that the output of a macro-expanded, annotated Lazy Racket
program uniquely describes the execution of a macro-expanded
Lazy Racket program without annotations. That is, we retrieve a
machine reduction sequence at the level of Racket with delay and
force. Second, we prove that this reduction sequence is equivalent
to the rewriting sequence of the original Lazy Racket program,
modulo label assignment. The following two subsections spell out
the two lemmas and present proof sketches. The proof of theorem 1
combines the two main lemmas from these subsections in
a straightforward fashion.



    A[(delay e^RKT)] =
    (let*
      ([t0 (output (cons Q[(delay e^RKT)] (ccm)))]
       [ℓ (alloc)]
       [t1 (output
             (cons (list "loc" ℓ Q[e^RKT])     ; box 6: Q[e^RKT]
                   (ccm)))])
      (delay A[e^RKT]))

    Figure 10 (delay case). Annotation function for delay and force.



5.1 Annotation and Reconstruction Correctness 

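To state the correctness lemma for the CSKM machine, we need the
following functions. First, $T : S^{CM} \to v^{CM}, \ldots$ consumes
a sequence of CSKM steps and produces the trace of output values:

$$T[\ldots \longmapsto_{cskm} P_i^{CM} \overset{v_i^{CM}}{\longmapsto}_{cskm} P_{i+1}^{CM} \longmapsto_{cskm} \ldots] = \ldots, v_i^{CM}, \ldots$$

Second, $A : e^{RKT} \to e^{CM}$ annotates a plain Racket program and
$R : v_1^{CM}, \ldots, v_n^{CM} \to S^{DF}$ reconstructs a CS machine
transition sequence for a Racket program.

Lemma 2 (Annotation/Reconstruction Correctness). For any Racket
program $e^{RKT}$, if
$\langle e^{RKT}, ()\rangle \longmapsto_{cs} \cdots \longmapsto_{cs} P^{DF}$,
then

$$R[T[\langle A[e^{RKT}], (), \texttt{mt}, \emptyset\rangle \longmapsto_{cskm} \cdots \longmapsto_{cskm} P^{CM}]] = \langle e^{RKT}, ()\rangle \longmapsto_{cs} \cdots \longmapsto_{cs} P^{DF}$$

Our annotation and reconstruction functions extend the functions
of Clements et al. (2001). Figures 10 and 11 summarize these
additions. We omit the parts defined by Clements et al. and instead
review the functions with some illustrative examples.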

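Annotation adds output expressions and continuation mark
wcm and ccm operations to a program. For example, annotating the
program (+ 1 2) results in the following annotated program:⁵

    (let*
      ([t0 (output (cons Q[(+ 1 2)] (ccm)))]
       [v1 (+ 1 2)]
       [t1 (output (cons Q[v1] (ccm)))])
      v1)

5 For clarity, the syntactic sugar let* and list forms are used. They are
defined in the standard way. Other minor code-readability improvements
have also been made.

$$
\begin{array}{l}
R : v_1^{CM}, \ldots, v_n^{CM} \to S^{DF} \\
R[\ldots, v_i^{CM}, \ldots] = \ldots, \langle R_E[(\texttt{rest}\ v_i^{CM})][R_C[(\texttt{first}\ v_i^{CM})]],\ R_S[v_1^{CM}, \ldots, v_i^{CM}]\rangle, \ldots \\[1ex]
R_E : v^{CM} \to E^{DF} \\
R_E[(\texttt{cons}\ (\texttt{list}\ \texttt{"force"})\ v^{CM})] = (\texttt{force}\ R_E[v^{CM}]) \\
R_E[(\texttt{cons}\ (\texttt{list}\ \texttt{"force"}\ \ell)\ v^{CM})] = (\texttt{force}\ \ell\ R_E[v^{CM}]) \\[1ex]
R_C : v^{CM} \to e^{DF} \\
R_C[(\texttt{list}\ \texttt{"val"}\ \ell\ v^{DF})] = Q^{-1}[v^{DF}] \\
R_C[(\texttt{list}\ \texttt{"loc"}\ \ell\ v^{DF})] = \ell \\
\text{otherwise, } R_C[v^{DF}] = Q^{-1}[v^{DF}] \\[1ex]
R_S : v_1^{CM}, \ldots, v_n^{CM} \to \sigma \\
R_S[(\texttt{cons}\ (\texttt{list}\ s\ \ell\ v^{DF\prime})\ v^{DF}), v_{rest}^{DF}, \ldots] = \\
\qquad (\texttt{cons}\ (\ell\ Q^{-1}[v^{DF\prime}])\ R_S[v_{rest}^{DF}, \ldots]) \quad \text{if } \ell \notin dom(R_S[v_{rest}^{DF}, \ldots]) \\
\qquad R_S[v_{rest}^{DF}, \ldots] \quad \text{if } \ell \in dom(R_S[v_{rest}^{DF}, \ldots]) \\
R_S[v^{DF}, v_{rest}^{DF}, \ldots] = R_S[v_{rest}^{DF}, \ldots] \quad \text{if } (\texttt{first}\ v^{DF}) \neq (\texttt{list}\ s\ \ell\ v^{DF\prime})
\end{array}
$$

Figure 11. Reconstruction function for delay and force.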

Annotated programs utilize the quoting function Q, which converts
an expression into a value representation. For example,
Q[(+ 1 2)] = (list "+" 1 2). There is also an inverse function,
$Q^{-1}$, for reconstruction. The above annotated program evaluates
to 3, outputting the values Q[(+ 1 2)] and Q[3] in the process, from
which the reduction sequence (+ 1 2) $\longmapsto$ 3 can be
recovered. The (ccm) calls in the example return the empty list since
no continuation marks were previously stored, i.e., there were no
calls to wcm. There is no need for wcm annotations because the entire
program is a redex, i.e., the context is empty.

Extending the example to (+ (+ 1 2) 5) yields 

    (let*
      ([v0 (wcm (list "prim2-1" "+" 5)
             (let*
               ([t0 (output (cons Q[(+ 1 2)] (ccm)))]
                [v1 (+ 1 2)]
                [t1 (output (cons Q[v1] (ccm)))])
               v1))]
       [v2 (+ v0 5)]
       [t2 (output (cons Q[v2] (ccm)))])
      v2)

This extended example contains the first program as a subexpression,
and therefore the annotated version of the program contains the
annotated version of the first example. The (+ 1 2) expression now
occurs in the context (+ [ ] 5) and the wcm expression stores an
appropriate continuation mark so this context can be reconstructed.
The "prim2-1" label indicates that the hole is in the left argument
position. The first output expression now produces the output value
(list Q[(+ 1 2)] (list "prim2-1" "+" 5)), which can be reconstructed
to the expression (+ (+ 1 2) 5). Reconstructing all outputs produces
(+ (+ 1 2) 5) $\longmapsto$ (+ 3 5) $\longmapsto$ 8.
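The reconstruction of this example can be traced with a short Python
sketch. It is only an illustration under assumed encodings:
expressions are nested tuples, quoted values are nested lists, and
only the "prim2-1" mark from the example is handled. The function
names are hypothetical and do not come from the stepper's
implementation.

```python
def quote(expr):
    """Q: turn an expression (nested tuples) into a value (nested lists)."""
    if isinstance(expr, tuple):
        return [quote(part) for part in expr]
    return expr


def unquote(value):
    """Inverse of quote: turn a quoted value back into an expression."""
    if isinstance(value, list):
        return tuple(unquote(part) for part in value)
    return value


def plug(mark, expr):
    """Rebuild one context frame around expr from its continuation mark."""
    if mark[0] == "prim2-1":             # the hole is the left argument
        _, op, right = mark
        return (op, expr, unquote(right))
    raise ValueError(f"unknown mark: {mark!r}")


def reconstruct(output):
    """Plug the quoted expression into the contexts named by the marks."""
    expr = unquote(output[0])
    for mark in output[1:]:              # marks are listed innermost first
        expr = plug(mark, expr)
    return expr


# The three outputs of the annotated program for (+ (+ 1 2) 5).
outputs = [
    [quote(("+", 1, 2)), ["prim2-1", "+", 5]],
    [3, ["prim2-1", "+", 5]],
    [8],
]
steps = [reconstruct(o) for o in outputs]
```

Each emitted value pairs a quoted expression with the marks of its
context, so the displayed step is obtained by plugging the expression
back into that context.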

Storing context information in continuation marks also enables
the reconstruction of a machine state, which is what the stepper
actually does, and the reconstruction and annotation functions
defined in figures 10 and 11 demonstrate how this works. Figure 10
shows that if the subexpression $e^{RKT}$ of a force expression does
not evaluate to a location, the annotations are like those for the
above examples. If $e^{RKT}$ produces a location, additional
continuation marks (figure 10, boxes 1 and 2) are needed to indicate
the presence of force contexts during evaluation of a delayed
computation. An additional output expression (box 3) is also needed
so that the steps showing the removal of both the (force [ ]) and
(force $\ell$ [ ]) contexts can be reconstructed. Note the
(rest (ccm)) (box 5) in the first output; this ensures the
(force $\ell$ [ ]) context is not part of the reconstructed control
string. The location v0 (box 4) is included in the output so the
store can be properly reconstructed. The "val" tag directs the
reconstruction function to use the value Q[v2] from the emitted
location-value pair for reconstructing the control string.

The annotation of a delay expression requires predicting the
location of the delayed computation in the store. We therefore
assume we have access to an alloc function that uses the same
location-allocating algorithm as the memory management system
of the machine.⁶ In addition to the location, the delayed expression
itself (box 6) is also included in the output, to enable
reconstruction of the store. The "loc" tag directs the reconstruction
function to use the location from the emitted location-value pair for
reconstructing the control string.

The reconstruction function in figure 11 consumes a list of
values, where each value is a sublist, and reconstructs a CS machine
state from each sublist. Again, only the cases involving delay
and force are defined. The rest of the function is borrowed from
Clements (2006). The first element of every $v^{CM}$ sublist
represents a (quoted) expression that is plugged into the context
represented by the rest of the sublist. The store is reconstructed by
retrieving all the location-value pairs in all the sublists up to the
current one. The arguments to the store-reconstruction function
$R_S$ may contain duplicate entries for a location, so a
location-value pair is only included in the resulting store if it
does not occur in any subsequent arguments.
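The "latest entry wins" rule for store reconstruction can be sketched
in Python as follows. The encoding is an assumption of this
illustration: each output is a list whose first element is either an
ordinary quoted expression or a tagged triple [tag, location, quoted],
with tag "val" or "loc". Unlike figure 11, the sketch keeps the stored
entries in quoted form instead of applying the inverse quoting
function.

```python
def reconstruct_store(outputs):
    """Build the store from the outputs emitted so far, oldest first."""
    store = {}
    for output in outputs:
        head = output[0]
        is_pair = (isinstance(head, list) and len(head) == 3
                   and head[0] in ("val", "loc"))
        if is_pair:
            _, loc, quoted = head
            store[loc] = quoted      # a later entry replaces an earlier one
    return store


outputs = [
    [["+", 1, 2]],                   # ordinary step, no store information
    [["loc", "l0", ["+", 1, 2]]],    # delay: l0 holds the suspended (+ 1 2)
    [["val", "l0", 3]],              # force: l0 now holds the value 3
]
store_after_delay = reconstruct_store(outputs[:2])
store_after_force = reconstruct_store(outputs)
```

Reconstructing the store for an earlier step uses only the prefix of
outputs up to that step, so the suspended expression is visible before
the force and the memoized value afterwards.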

Lemma 2 Proof Sketch. The proof of lemma 2 extends Clements's
proof (Clements 2006, Section 3.4) with cases for delay and
force. We also modify the argumentation for the original cases
to cope with the additional store, but doing so is straightforward
because the store is simply threaded through. □

5.2 Lazy Racket Correctness 



The function $\varphi$ macro-expands a Lazy Racket program.
Because source terms don't include labels, $\varphi$ is undefined for
labeled terms. Its partial inverse
$\varphi^{-1} : c^{DF} \times \sigma \to e$ synthesizes an
unlabeled Lazy Racket program from a (CS machine representation
of a) plain Racket program. The expansion and synthesis functions
are defined in figures 12 and 13, respectively.

Lemma 3 states the correctness of Lazy Racket in terms of the
functions $\varphi$, $\varphi^{-1}$, $\zeta$, and the $\lambda_{LR}$
rewriting system from section 3. That is, every CS machine transition
sequence has an equivalent $\lambda_{LR}$ rewriting sequence, modulo
$\varphi^{-1}$ and $\zeta$.

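6 Since labels are not displayed as numbers but as sharing among
expressions, this unrealistic mathematical assumption is acceptable.

$$
\begin{array}{rcl}
\varphi & : & e \to e^{RKT} \\
\varphi[\lambda x.e] & = & \lambda x.\varphi[e] \\
\varphi[(e_1\ e_2)] & = & ((\texttt{force}\ \varphi[e_1])\ (\texttt{delay}\ \varphi[e_2])) \\
\varphi[(p^2\ e_1\ e_2)] & = & (p^2\ (\texttt{force}\ \varphi[e_1])\ (\texttt{force}\ \varphi[e_2])) \\
\varphi[(\texttt{cons}\ e_1\ e_2)] & = & (\texttt{cons}\ (\texttt{delay}\ \varphi[e_1])\ (\texttt{delay}\ \varphi[e_2])) \\
\varphi[(p^1\ e)] & = & (p^1\ (\texttt{force}\ \varphi[e])) \\
\varphi[(\texttt{if}\ e_1\ e_2\ e_3)] & = & (\texttt{if}\ (\texttt{force}\ \varphi[e_1])\ \varphi[e_2]\ \varphi[e_3]) \\
\text{otherwise, } \varphi[e] & = & e
\end{array}
$$

Figure 12. Macro-expanding Lazy Racket.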
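The expansion in figure 12 is a straightforward structural recursion.
As a sketch only, it can be transcribed to Python over an assumed
tuple encoding of terms, where the tags "lam", "app", "prim2",
"prim1", "cons", "if", and "var" are hypothetical names chosen for
this illustration and not the stepper's actual data structures.

```python
def expand(e):
    """Macro-expand a lazy term into a term with explicit delay and force."""
    if not isinstance(e, tuple):
        return e                                   # numbers, strings, booleans
    tag = e[0]
    if tag == "lam":
        _, x, body = e
        return ("lam", x, expand(body))
    if tag == "app":                               # force the operator, delay the operand
        _, f, a = e
        return ("app", ("force", expand(f)), ("delay", expand(a)))
    if tag == "prim2":                             # binary primitives are strict
        _, op, a, b = e
        return ("prim2", op, ("force", expand(a)), ("force", expand(b)))
    if tag == "cons":                              # both fields are delayed
        _, a, b = e
        return ("cons", ("delay", expand(a)), ("delay", expand(b)))
    if tag == "prim1":                             # first and rest are strict
        _, op, a = e
        return ("prim1", op, ("force", expand(a)))
    if tag == "if":                                # only the test is forced
        _, test, then, other = e
        return ("if", ("force", expand(test)), expand(then), expand(other))
    return e                                       # otherwise: variables and the like


identity = ("lam", "x", ("var", "x"))
expanded = expand(("app", identity, ("prim2", "+", 1, 2)))
```

In the expanded application the operand (+ 1 2) sits under a delay, so
it is evaluated only if the function body forces its parameter.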

Lemma 3 (LR Correctness). For all Lazy Racket programs e and
Racket programs $c^{DF}$ such that
$\langle \varphi[e], ()\rangle \twoheadrightarrow_{cs} \langle c^{DF}, \sigma\rangle$,
there exists a Lazy Racket program $e^{LR}$ such that
$e \twoheadrightarrow_{LR} e^{LR}$ and
$\varphi^{-1}[c^{DF}]\sigma = \zeta[e^{LR}]$.

Proof Sketch. We prove the lemma by induction on the number of
CS machine steps. For the base case, the lemma holds because
$\varphi^{-1}[\varphi[e]]() = e$. Otherwise, we proceed by case
analysis on the last transition step.

For each case, we prove correct synthesis of evaluation contexts
and redexes separately. When the rewriting of a redex affects
the context, too (see figure 5, third column), we need information
about the redex for the synthesis of a context. This parallel
rewriting of $\lambda_{LR}$ programs is equivalent to the reduction
of stored expressions in the CS machine if there are multiple
references to that expression. If the stored expression is a value,
the reconstruction naturally reifies the value throughout because
$\varphi^{-1}$ translates locations by retrieving and unexpanding the
expression at that store location.

If a stored expression is in the process of being reduced to a
value, however, the store has not been updated but all references to
this location must reflect the partial reductions. Such intermediate
steps manifest themselves as reductions under (force $\ell$ [ ])
contexts in plain Racket. To synthesize these intermediate steps
properly, $\varphi^{-1}$ first updates the store with the partially
reduced expression and then synthesizes the rest of the state into a
Lazy Racket program. As a result, the translation of every occurrence
of a location yields the desired intermediate expression. Only
synthesis of (force $\ell$ $c^{DF}$) control strings yields a new
store; thus only subexpressions that possibly contain
(force $\ell$ $c^{DF}$), i.e., subexpressions that can contain the
redex, yield a new store.

For synthesis of a $\beta_v$ redex, the substituted argument is a
location because $\varphi$ wraps all application arguments in a
delay. The $\varphi^{-1}$ synthesis function translates locations to
the delayed expression at that location, so the corresponding
$\lambda_{LR}$ step is $\beta_{LR}$. The $\beta_{LR}$ step adds a
label, which is removed with $\zeta$.

For synthesis of PRIM, FIRST, REST, IF-TRUE, or IF-FALSE
redexes, $\varphi^{-1}$ yields a PRIM, FIRST, REST, IF-TRUE, or
IF-FALSE $\lambda_{LR}$ redex, respectively, and these rewriting
steps insert no labels.

If the last CS step,
$\langle c^{DF}, \sigma\rangle \longmapsto_{cs} \langle c^{DF\prime}, \sigma'\rangle$,
follows DELAY, FORCE-DELAY, FORCE-UPDATE, or FORCE-NONDELAY, there is
no corresponding $\lambda_{LR}$ rewriting rule and
$\varphi^{-1}[c^{DF}]\sigma = \varphi^{-1}[c^{DF\prime}]\sigma'$.
The lemma holds since this is equivalent to taking zero steps. □

6. Performance 

The performance of a stepper tool must be measured against the 
programmer's ability to use the tool. In particular, raw performance 
numbers are inconsequential because the usability of the tool pri- 
marily depends on its responsiveness to I/O. 



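$$
\begin{array}{l}
\varphi^{-1} : c^{DF} \times \sigma \to e \\
\varphi^{-1}[c^{DF}]\sigma = e \quad \text{where } (e, \sigma') = \hat{\varphi}[c^{DF}]\sigma \\[1ex]
\hat{\varphi} : c^{DF} \times \sigma \to (e, \sigma) \\
\hat{\varphi}[\lambda x.e^{DF}]\sigma = (\lambda x.e, \sigma) \quad \text{where } (e, \sigma) = \hat{\varphi}[e^{DF}]\sigma \\
\hat{\varphi}[(c_1^{DF}\ c_2^{DF})]\sigma = ((e_1\ e_2), \sigma'') \quad \text{where } (e_1, \sigma') = \hat{\varphi}[c_1^{DF}]\sigma,\ (e_2, \sigma'') = \hat{\varphi}[c_2^{DF}]\sigma' \\
\hat{\varphi}[(p^2\ c_1^{DF}\ c_2^{DF})]\sigma = ((p^2\ e_1\ e_2), \sigma'') \quad \text{where } (e_1, \sigma') = \hat{\varphi}[c_1^{DF}]\sigma,\ (e_2, \sigma'') = \hat{\varphi}[c_2^{DF}]\sigma' \\
\hat{\varphi}[(\texttt{cons}\ c_1^{DF}\ c_2^{DF})]\sigma = ((\texttt{cons}\ e_1\ e_2), \sigma'') \quad \text{where } (e_1, \sigma') = \hat{\varphi}[c_1^{DF}]\sigma,\ (e_2, \sigma'') = \hat{\varphi}[c_2^{DF}]\sigma' \\
\hat{\varphi}[(p^1\ c^{DF})]\sigma = ((p^1\ e), \sigma') \quad \text{where } (e, \sigma') = \hat{\varphi}[c^{DF}]\sigma \\
\hat{\varphi}[(\texttt{if}\ c_1^{DF}\ e_2^{DF}\ e_3^{DF})]\sigma = ((\texttt{if}\ e_1\ e_2\ e_3), \sigma') \quad \text{where } (e_1, \sigma') = \hat{\varphi}[c_1^{DF}]\sigma,\ (e_2, \sigma') = \hat{\varphi}[e_2^{DF}]\sigma',\ (e_3, \sigma') = \hat{\varphi}[e_3^{DF}]\sigma' \\
\hat{\varphi}[(\texttt{delay}\ e^{DF})]\sigma = \hat{\varphi}[e^{DF}]\sigma \\
\hat{\varphi}[\ell]\sigma = \hat{\varphi}[\sigma[\ell]]\sigma \\
\hat{\varphi}[(\texttt{force}\ c^{DF})]\sigma = \hat{\varphi}[c^{DF}]\sigma \\
\hat{\varphi}[(\texttt{force}\ \ell\ c^{DF})]\sigma = (e, \sigma'[\ell \leftarrow e]) \quad \text{where } (e, \sigma') = \hat{\varphi}[c^{DF}]\sigma \\
\text{otherwise, } \hat{\varphi}[e^{DF}]\sigma = (e^{DF}, \sigma)
\end{array}
$$

Figure 13. Synthesizing Lazy Racket from plain Racket.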
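The store-threading synthesis of figure 13 can be approximated by the
following Python sketch. The tuple encoding, the "loc" and "force-at"
tags (the latter standing for a force context that remembers a
location), and the function names are assumptions of this
illustration. The sketch threads the store left to right and lets a
"force-at" node record its partially reduced expression, so that
later references to the same location see it.

```python
def synth(c, store):
    """Return (lazy_expr, new_store) for a machine expression c."""
    if not isinstance(c, tuple):
        return c, store
    tag = c[0]
    if tag == "loc":                       # a location stands for its stored expression
        return synth(store[c], store)
    if tag in ("delay", "force"):          # drop the delay or force wrapper
        return synth(c[1], store)
    if tag == "force-at":                  # record the partial result at the location
        _, loc, body = c
        e, s = synth(body, store)
        return e, {**s, loc: e}
    if tag == "lam":
        _, x, body = c
        e, s = synth(body, store)
        return ("lam", x, e), s
    if tag in ("app", "cons"):
        _, a, b = c
        e1, s1 = synth(a, store)
        e2, s2 = synth(b, s1)
        return (tag, e1, e2), s2
    if tag == "prim2":
        _, op, a, b = c
        e1, s1 = synth(a, store)
        e2, s2 = synth(b, s1)
        return ("prim2", op, e1, e2), s2
    if tag == "prim1":
        _, op, a = c
        e, s = synth(a, store)
        return ("prim1", op, e), s
    if tag == "if":                        # branches are read with the test's store
        _, test, then, other = c
        e1, s1 = synth(test, store)
        e2, _ = synth(then, s1)
        e3, _ = synth(other, s1)
        return ("if", e1, e2, e3), s1
    return c, store                        # otherwise: variables and the like


def unexpand(c, store):
    """Synthesize the lazy program and discard the threaded store."""
    return synth(c, store)[0]


loc0 = ("loc", 0)
store = {loc0: ("prim2", "+", ("force", 1), ("force", 2))}
# The shared argument at loc0 has been reduced to 3 under a force context,
# but the store itself has not been updated yet.
state = ("prim2", "+", ("force", ("force-at", loc0, 3)), ("force", loc0))
shown = unexpand(state, store)
```

In this example both occurrences of the shared argument are shown as 3
even though the store still holds the unreduced sum, which mirrors the
treatment of intermediate steps in the proof sketch of lemma 3.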



Nevertheless, it is important to quantify the basic performance 
penalty of a stepper. In our case, the stepper slows down the execu- 
tion of programs by a factor of 20 (up to 60), as the following table 
for a small number of micro-benchmarks shows: 



    Test Name    Slowdown Factor
    fibo         21.4
    ack          32.7
    partial      27.3
    tak          23.0
    takl         34.9
    takr         55.5



The numbers describe the performance ratio of annotated versus 
unannotated programs. While they include both annotation and 
evaluation time, the annotation time is negligible when compared 
to the time it takes to run the program. 

We achieve adequate responsiveness of the stepper with a
straightforward arrangement. As soon as the stepper backend
produces output, the front-end asynchronously displays reduction
steps. By the time the programmer has read and understood the
first step, the stepper can quickly respond to additional requests.

7. Related Work 

Debugging programs in lazy languages poses serious challenges, as
numerous attempts have shown over the past three decades. Most
recent debuggers are for Haskell. Allwood et al. (2009) developed
a stack-trace-generating tool for GHC that aids in identifying the
context of an error. Their tool transforms programs to carry around
an extra stack parameter for accessing the program context, similar
to continuation marks. Their tool approximates a call-by-value
stack, however, instead of displaying the lazy evaluation order.

Ennals and Peyton Jones (2003a) built HsDebug for GHC,
which employs the "stop, examine, continue" style found in
imperative languages, e.g., gdb. With HsDebug, programmers can
set breakpoints and can examine the program state at breakpoints.
Like Allwood's debugger, HsDebug does not preserve laziness in
programs; the debugger must make certain tweaks, resulting in an
"optimistic" evaluation model (Ennals and Peyton Jones 2003b).

The most recent version of the GHC system has debugging
features built into its interpreter (Marlow et al. 2007). In
principle, the GHCi debugger can most closely mimic the operation of
our stepper because: (1) it allows a programmer to single-step
through the evaluation of a program, and (2) it preserves laziness
for the user to observe. In contrast to our stepper, the GHCi
debugger does not use a substitution semantics, and only displays a
few lines of the program at a time, resulting in frequent jumps from
the body of a function to different call sites, when arguments need
to be evaluated. This jumping can be confusing for a programmer to
follow, especially a novice to lazy evaluation. Also, sharing is not
easily observed in the GHCi debugger. All thunks are rendered the
same way (as a "_" character), and when stepping, the debugger
skips over the reduction of a variable, so it is occasionally unclear
which argument is being evaluated.

Hat 2.0 (Chitil et al. 2003; Wallace et al. 2001) is a suite of
debugging tools for GHC that aggregates and improves on several
previous tools: Hat 1.0 (Sparud and Runciman 1997), Hood (Gill
2000), and Freja (Nilsson 1998). Hat's implementation resembles
the implementation of our stepper in that it works by transforming a
source program into an annotated one, which, when run, generates
trace information. The generated trace is then interpreted by the
tools. In Hat, the generated trace can be viewed in several different
styles: as a directed graph of expressions connected according to
redex-contractum relations (like the original Hat), in the
question-answer style of an algorithmic debugger (like Freja), or by
tracking specific values in a computation (like Hood). The graph
viewer is somewhat similar to our tool; however, program evaluation
is portrayed using graph semantics, and when viewing the graph, it
can be difficult to deduce the order of reductions for graphs larger
than a few nodes and links. Hat also includes a viewer, hat-stack,
that shows a simulated eager evaluation stack, similar to Allwood's
tool. Hat is no longer maintained and it is not clear if it still
works with the more recent version of GHC.

Finally, only one of these tools comes with a formal model and
correctness proof for its architecture. Chitil and Luo (2006)
developed a model for Hat's trace generator and showed
that the evaluation steps can be reconstructed from the information
in the traces.

8. Conclusion 

We have presented a lazy stepper as an additional tool for the lazy
language debugging arsenal. The stepper presents computation as
the standard rewriting sequence of a novel lazy semantics. While
the stepper is implemented in Lazy Racket via a "macro" over the
existing stepper for strict Racket, our paper explains the general
software architecture via a generic theoretical model. We conjecture
that it is straightforward to construct a stepper on top of other
architectures, like Allwood's StackTrace.